Prompted Opinion Summarization with GPT-3.5
Large language models have shown impressive performance across a wide variety
of tasks, including text summarization. In this paper, we show that this strong
performance extends to opinion summarization. We explore several pipeline
methods for applying GPT-3.5 to summarize a large collection of user reviews in
a prompted fashion. To handle arbitrarily large numbers of user reviews, we
explore recursive summarization as well as methods for selecting salient
content to summarize through supervised clustering or extraction. On two
datasets, an aspect-oriented summarization dataset of hotel reviews (SPACE) and
a generic summarization dataset of Amazon and Yelp reviews (FewSum), we show
that GPT-3.5 models achieve very strong performance in human evaluation. We
argue that standard evaluation metrics do not reflect this, and introduce three
new metrics targeting faithfulness, factuality, and genericity to contrast
these different methods.
Comment: Accepted to ACL (Findings) 2023
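A minimal sketch of the recursive-summarization idea described above, not the paper's exact pipeline: reviews are summarized in chunks, and the chunk summaries are then summarized again until a single summary remains. The model name, prompt wording, and chunk size are illustrative assumptions.

```python
# Recursive opinion summarization via prompting (illustrative sketch).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def summarize(text: str, instruction: str) -> str:
    """One prompted summarization call."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # stand-in for the GPT-3.5 models in the paper
        messages=[{"role": "user", "content": f"{instruction}\n\n{text}"}],
        temperature=0,
    )
    return response.choices[0].message.content


def recursive_summarize(reviews: list[str], chunk_size: int = 8) -> str:
    """Summarize reviews in chunks, then summarize the chunk summaries."""
    if len(reviews) <= chunk_size:
        return summarize("\n\n".join(reviews), "Summarize these user reviews:")
    chunk_summaries = [
        summarize("\n\n".join(reviews[i:i + chunk_size]),
                  "Summarize these user reviews:")
        for i in range(0, len(reviews), chunk_size)
    ]
    return recursive_summarize(chunk_summaries, chunk_size)
```

The same skeleton accommodates the selection-based variants: replacing the chunking step with supervised clustering or extraction of salient reviews before the final prompted call.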
Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
In long document controllable summarization, where labeled data is scarce,
pretrained models struggle to adapt to the task and effectively respond to user
queries. In this paper, we introduce Socratic pretraining, a question-driven,
unsupervised pretraining objective specifically designed to improve
controllability in summarization tasks. By training a model to generate and
answer relevant questions in a given context, Socratic pretraining enables the
model to more effectively adhere to user-provided queries and identify relevant
content to be summarized. We demonstrate the effectiveness of this approach
through extensive experimentation on two summarization domains, short stories
and dialogue, and multiple control strategies: keywords, questions, and factoid
QA pairs. Our pretraining method relies only on unlabeled documents and a
question generation system and outperforms pre-finetuning approaches that use
additional supervised data. Furthermore, our results show that Socratic
pretraining cuts task-specific labeled data requirements in half, is more
faithful to user-provided queries, and achieves state-of-the-art performance on
QMSum and SQuALITY.
Comment: To appear at ACL 2023
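A rough sketch of how question-driven pretraining pairs could be built from unlabeled documents, under assumptions about the setup: a question-generation system proposes a question about a salient passage, and the pretraining target asks that question and then answers it with the passage. The control tags and masking scheme here are illustrative, not the paper's exact format.

```python
# Building one question-driven (input, target) pretraining pair (sketch).
from typing import Callable


def make_socratic_example(
    document: str,
    passage: str,
    generate_question: Callable[[str], str],
) -> dict:
    """Build a pretraining pair for one salient passage.

    `generate_question` stands in for whatever question-generation system
    is available; its output is a question answerable by `passage`.
    """
    question = generate_question(passage)
    # Mask the salient passage in the input; the target first asks a question
    # about it and then answers by regenerating the passage, so the model
    # learns to identify and produce content that responds to a query.
    source = document.replace(passage, "<mask>")
    target = f"<ask> {question} <answer> {passage}"
    return {"input": source, "target": target}
```

At fine-tuning time, the same ask/answer interface can carry user-provided keywords, questions, or QA pairs as control signals.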
Generating EDU Extracts for Plan-Guided Summary Re-Ranking
Two-step approaches, in which summary candidates are first generated and then
re-ranked to return a single summary, can improve ROUGE scores over the standard
single-step approach. Yet, standard decoding methods (i.e., beam search,
nucleus sampling, and diverse beam search) produce candidates with redundant,
and often low quality, content. In this paper, we design a novel method to
generate candidates for re-ranking that addresses these issues. We ground each
candidate abstract on its own unique content plan and generate distinct
plan-guided abstracts using a model's top beam. More concretely, a standard
language model (a BART LM) auto-regressively generates elementary discourse unit
(EDU) content plans with an extractive copy mechanism. The top K beams from the
content plan generator are then used to guide a separate LM, which produces a
single abstractive candidate for each distinct plan. We apply an existing
re-ranker (BRIO) to abstractive candidates generated from our method, as well
as baseline decoding methods. We show large relevance improvements over
previously published methods on widely used single document news article
corpora, with ROUGE-2 F1 gains of 0.88, 2.01, and 0.38 on CNN/DailyMail, NYT,
and XSum, respectively. A human evaluation on CNN/DailyMail validates these results.
Similarly, on 1k samples from CNN/DailyMail, we show that prompting GPT-3 to follow
EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points. Code to
generate and realize plans is available at
https://github.com/griff4692/edu-sum.
Comment: ACL 2023
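A simplified sketch of the generate-then-rerank pipeline described above, not the released code: an ordinary seq2seq checkpoint stands in for the EDU plan generator (the extractive copy mechanism is omitted), its top-K beams are prepended to the article to guide a second summarizer, and a scoring callable stands in for the BRIO re-ranker. Checkpoint names and the plan/article separator are placeholder assumptions.

```python
# Plan-guided candidate generation followed by re-ranking (illustrative sketch).
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
planner = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
summarizer = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")


def plan_guided_candidates(article: str, k: int = 4) -> list[str]:
    """Generate one abstractive candidate per top-k content plan."""
    inputs = tokenizer(article, return_tensors="pt", truncation=True)
    # Top-k beams from the plan generator serve as k distinct content plans.
    plan_ids = planner.generate(
        **inputs, num_beams=k, num_return_sequences=k, max_length=128
    )
    plans = tokenizer.batch_decode(plan_ids, skip_special_tokens=True)
    candidates = []
    for plan in plans:
        # Condition the summarizer on one plan by prepending it to the article.
        guided = tokenizer(f"{plan} </s> {article}",
                           return_tensors="pt", truncation=True)
        out = summarizer.generate(**guided, num_beams=4, max_length=142)
        candidates.append(tokenizer.decode(out[0], skip_special_tokens=True))
    return candidates


def rerank(candidates: list[str], score) -> str:
    """Return the highest-scoring candidate (BRIO plays this role in the paper)."""
    return max(candidates, key=score)
```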